Lets do a short introduction to what we’re working with here.

Target Variable

  • \(I\) - Irrigation Percentage - Downloaded the data from Siebert et al. (2015) data set in ha/country/timestep, then by dividing by the countries total area, the fraction of total land area that is irrigated in a given country is created.
#original Siebert Data
aei <- read.csv2("/Volumes/RachelExternal/Thesis/Data/SiebertData/HID_v10/Country_tenyear_ha.csv")
aei <- as.data.frame(aei)

#rename the cols so that I can use them as number
colnames(aei) <- c("country", "ISO", "ID", "1900", "1910","1920","1930","1940","1950","1960","1970","1980","1985","1990","1995","2000","2005")

#tidying data to long format
aei <- pivot_longer(aei,cols = 4:17, names_to = "year", values_to = "aei_ha")
aei$aei_ha <- removePunctuation(aei$aei_ha)
aei$year <- as.numeric(aei$year)
aei$aei_ha <- as.numeric(aei$aei_ha)
aei$ISO <- as.factor(aei$ISO)
aei$country <- as.factor(aei$country)
#subset the data so that its from the year 1960
aei <- aei %>% subset(year >= 1960)

#make a new col for year count
aei$yearcount <- (aei$year - 1960)

#remove data for the total global irrigated area
globeaei <- subset(aei, country == "WORLD")
aei= aei[!aei$country == "WORLD",]

str(aei)
## tibble [1,848 × 6] (S3: tbl_df/tbl/data.frame)
##  $ country  : Factor w/ 232 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 2 2 ...
##  $ ISO      : Factor w/ 232 levels "","ABW","AFG",..: 3 3 3 3 3 3 3 3 6 6 ...
##  $ ID       : int [1:1848] 2 2 2 2 2 2 2 2 6 6 ...
##  $ year     : num [1:1848] 1960 1970 1980 1985 1990 ...
##  $ aei_ha   : num [1:1848] 2404990 2408168 2518178 2594394 2903569 ...
##  $ yearcount: num [1:1848] 0 10 20 25 30 35 40 45 0 10 ...

Added to this to be able to calculate the % of land irrigated per country we need the country area. This data is taken from Worldbank.

#import the data set with total country area, this comes from the Worldbank
countryarea <- read.csv2("/Volumes/RachelExternal/Thesis/Data/LandArea/LandArea.csv")

#removing extra cols
countryarea <- countryarea[-c(265:270), ]
countryarea <- countryarea[, -c(1, 3:4)]

#fix the col names (in the two days since i found this little for loop I learned how to do this faster!)
for ( col in 2:ncol(countryarea)){
    colnames(countryarea)[col] <-  sub("X", "", colnames(countryarea)[col])
}
remove(col)

#making long data
countryarea <- 
  countryarea %>%  
  pivot_longer(!Country.Code, names_to = "year", values_to = "area_km") %>% 
  rename(ISO = Country.Code)

#fixing variable types
countryarea$ISO <- as.factor(countryarea$ISO)
countryarea$year <- as.numeric(countryarea$year)
str(countryarea)
## tibble [16,104 × 3] (S3: tbl_df/tbl/data.frame)
##  $ ISO    : Factor w/ 264 levels "ABW","AFG","AGO",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year   : num [1:16104] 1960 1961 1962 1963 1964 ...
##  $ area_km: num [1:16104] NA 180 180 180 180 180 180 180 180 180 ...

For some reason, the data from Worldbank on each countries area only starts in 1961. For this analysis, 1960 is needed. Country area from 1961 will be substituted for 1960, with the assumption that country area doesn’t change drastically in one year (it doesn’t).

head(subset(countryarea, c(year == 1960 | year == 1961))) 
## # A tibble: 6 x 3
##   ISO    year area_km
##   <fct> <dbl>   <dbl>
## 1 ABW    1960      NA
## 2 ABW    1961     180
## 3 AFG    1960      NA
## 4 AFG    1961  652860
## 5 AGO    1960      NA
## 6 AGO    1961 1246700
#fixing 1960
countryarea <-
  countryarea %>% 
  group_by(ISO) %>% 
  fill(area_km, .direction = "up") %>% 
  ungroup()

Merge the country area with the AEI df here.

#combine these data sets
aei <- aei %>% merge(y = countryarea, by.x = c("ISO", "year"), all.x = TRUE)

rm(countryarea)
head(aei)
##   ISO year country ID aei_ha yearcount area_km
## 1 ABW 1960   Aruba  1      0         0     180
## 2 ABW 1970   Aruba  1      0        10     180
## 3 ABW 1980   Aruba  1      0        20     180
## 4 ABW 1985   Aruba  1      0        25     180
## 5 ABW 1990   Aruba  1      0        30     180
## 6 ABW 1995   Aruba  1      0        35     180

Some countries also have no area value. I’ll go through and see if any of the countries with missing area values have non zero AEI_ha. If so, I’ll replace them, as they didn’t come through with the WorldBank data set but are online. Copying and pasting from various sources should be fine.

#checking for countries with NA values in their area_km col
#looking primarily for countries that have non 0 AEI_ha
aei[is.na(aei$area_km),]
##      ISO year                        country  ID  aei_ha yearcount area_km
## 25   AIA 1960                       Anguilla   4       0         0      NA
## 26   AIA 1970                       Anguilla   4       0        10      NA
## 27   AIA 1980                       Anguilla   4       0        20      NA
## 28   AIA 1985                       Anguilla   4       0        25      NA
## 29   AIA 1990                       Anguilla   4       0        30      NA
## 30   AIA 1995                       Anguilla   4       0        35      NA
## 31   AIA 2000                       Anguilla   4       0        40      NA
## 32   AIA 2005                       Anguilla   4       0        45      NA
## 49   ANT 1960           Netherlands Antilles   8       0         0      NA
## 50   ANT 1970           Netherlands Antilles   8       0        10      NA
## 51   ANT 1980           Netherlands Antilles   8       0        20      NA
## 52   ANT 1985           Netherlands Antilles   8       0        25      NA
## 53   ANT 1990           Netherlands Antilles   8       0        30      NA
## 54   ANT 1995           Netherlands Antilles   8       0        35      NA
## 55   ANT 2000           Netherlands Antilles   8       0        40      NA
## 56   ANT 2005           Netherlands Antilles   8       0        45      NA
## 89   ATA 1960                     Antarctica  13       0         0      NA
## 90   ATA 1970                     Antarctica  13       0        10      NA
## 91   ATA 1980                     Antarctica  13       0        20      NA
## 92   ATA 1985                     Antarctica  13       0        25      NA
## 93   ATA 1990                     Antarctica  13       0        30      NA
## 94   ATA 1995                     Antarctica  13       0        35      NA
## 95   ATA 2000                     Antarctica  13       0        40      NA
## 96   ATA 2005                     Antarctica  13       0        45      NA
## 265  BVT 1960                  Bouvet Island  36       0         0      NA
## 266  BVT 1970                  Bouvet Island  36       0        10      NA
## 267  BVT 1980                  Bouvet Island  36       0        20      NA
## 268  BVT 1985                  Bouvet Island  36       0        25      NA
## 269  BVT 1990                  Bouvet Island  36       0        30      NA
## 270  BVT 1995                  Bouvet Island  36       0        35      NA
## 271  BVT 2000                  Bouvet Island  36       0        40      NA
## 272  BVT 2005                  Bouvet Island  36       0        45      NA
## 297  CCK 1960                  Cocos Islands  42       0         0      NA
## 298  CCK 1970                  Cocos Islands  42       0        10      NA
## 299  CCK 1980                  Cocos Islands  42       0        20      NA
## 300  CCK 1985                  Cocos Islands  42       0        25      NA
## 301  CCK 1990                  Cocos Islands  42       0        30      NA
## 302  CCK 1995                  Cocos Islands  42       0        35      NA
## 303  CCK 2000                  Cocos Islands  42       0        40      NA
## 304  CCK 2005                  Cocos Islands  42       0        45      NA
## 361  COK 1960                   Cook Islands  50       0         0      NA
## 362  COK 1970                   Cook Islands  50       0        10      NA
## 363  COK 1980                   Cook Islands  50       0        20      NA
## 364  COK 1985                   Cook Islands  50       0        25      NA
## 365  COK 1990                   Cook Islands  50       0        30      NA
## 366  COK 1995                   Cook Islands  50       0        35      NA
## 367  COK 2000                   Cook Islands  50       0        40      NA
## 368  COK 2005                   Cook Islands  50       0        45      NA
## 409  CXR 1960               Christmas Island  56       0         0      NA
## 410  CXR 1970               Christmas Island  56       0        10      NA
## 411  CXR 1980               Christmas Island  56       0        20      NA
## 412  CXR 1985               Christmas Island  56       0        25      NA
## 413  CXR 1990               Christmas Island  56       0        30      NA
## 414  CXR 1995               Christmas Island  56       0        35      NA
## 415  CXR 2000               Christmas Island  56       0        40      NA
## 416  CXR 2005               Christmas Island  56       0        45      NA
## 513  ESH 1960                 Western Sahara  69       0         0      NA
## 514  ESH 1970                 Western Sahara  69       0        10      NA
## 515  ESH 1980                 Western Sahara  69       0        20      NA
## 516  ESH 1985                 Western Sahara  69       0        25      NA
## 517  ESH 1990                 Western Sahara  69       0        30      NA
## 518  ESH 1995                 Western Sahara  69       0        35      NA
## 519  ESH 2000                 Western Sahara  69       0        40      NA
## 520  ESH 2005                 Western Sahara  69       0        45      NA
## 561  FLK 1960               Falkland Islands  75       0         0      NA
## 562  FLK 1970               Falkland Islands  75       0        10      NA
## 563  FLK 1980               Falkland Islands  75       0        20      NA
## 564  FLK 1985               Falkland Islands  75       0        25      NA
## 565  FLK 1990               Falkland Islands  75       0        30      NA
## 566  FLK 1995               Falkland Islands  75       0        35      NA
## 567  FLK 2000               Falkland Islands  75       0        40      NA
## 568  FLK 2005               Falkland Islands  75       0        45      NA
## 641  GLP 1960                     Guadeloupe  86     889         0      NA
## 642  GLP 1970                     Guadeloupe  86    2000        10      NA
## 643  GLP 1980                     Guadeloupe  86    2000        20      NA
## 644  GLP 1985                     Guadeloupe  86    2000        25      NA
## 645  GLP 1990                     Guadeloupe  86    2000        30      NA
## 646  GLP 1995                     Guadeloupe  86    2000        35      NA
## 647  GLP 2000                     Guadeloupe  86    6000        40      NA
## 648  GLP 2005                     Guadeloupe  86    6454        45      NA
## 705  GUF 1960                  French Guiana  94    1000         0      NA
## 706  GUF 1970                  French Guiana  94    1000        10      NA
## 707  GUF 1980                  French Guiana  94    1000        20      NA
## 708  GUF 1985                  French Guiana  94    2000        25      NA
## 709  GUF 1990                  French Guiana  94    2000        30      NA
## 710  GUF 1995                  French Guiana  94    3500        35      NA
## 711  GUF 2000                  French Guiana  94    4800        40      NA
## 712  GUF 2005                  French Guiana  94    5608        45      NA
## 729  HMD 1960     Heard and McDonald Islands  98       0         0      NA
## 730  HMD 1970     Heard and McDonald Islands  98       0        10      NA
## 731  HMD 1980     Heard and McDonald Islands  98       0        20      NA
## 732  HMD 1985     Heard and McDonald Islands  98       0        25      NA
## 733  HMD 1990     Heard and McDonald Islands  98       0        30      NA
## 734  HMD 1995     Heard and McDonald Islands  98       0        35      NA
## 735  HMD 2000     Heard and McDonald Islands  98       0        40      NA
## 736  HMD 2005     Heard and McDonald Islands  98       0        45      NA
## 785  IOT 1960 British Indian Ocean Territory 106       0         0      NA
## 786  IOT 1970 British Indian Ocean Territory 106       0        10      NA
## 787  IOT 1980 British Indian Ocean Territory 106       0        20      NA
## 788  IOT 1985 British Indian Ocean Territory 106       0        25      NA
## 789  IOT 1990 British Indian Ocean Territory 106       0        30      NA
## 790  IOT 1995 British Indian Ocean Territory 106       0        35      NA
## 791  IOT 2000 British Indian Ocean Territory 106       0        40      NA
## 792  IOT 2005 British Indian Ocean Territory 106       0        45      NA
## 1129 MSR 1960                     Montserrat 154       0         0      NA
## 1130 MSR 1970                     Montserrat 154       0        10      NA
## 1131 MSR 1980                     Montserrat 154       0        20      NA
## 1132 MSR 1985                     Montserrat 154       0        25      NA
## 1133 MSR 1990                     Montserrat 154       0        30      NA
## 1134 MSR 1995                     Montserrat 154       0        35      NA
## 1135 MSR 2000                     Montserrat 154       0        40      NA
## 1136 MSR 2005                     Montserrat 154       0        45      NA
## 1137 MTQ 1960                     Martinique 155    1000         0      NA
## 1138 MTQ 1970                     Martinique 155    1000        10      NA
## 1139 MTQ 1980                     Martinique 155    5000        20      NA
## 1140 MTQ 1985                     Martinique 155    4000        25      NA
## 1141 MTQ 1990                     Martinique 155    4000        30      NA
## 1142 MTQ 1995                     Martinique 155    5000        35      NA
## 1143 MTQ 2000                     Martinique 155    7000        40      NA
## 1144 MTQ 2005                     Martinique 155    6407        45      NA
## 1193 NFK 1960                 Norfolk Island 163       0         0      NA
## 1194 NFK 1970                 Norfolk Island 163       0        10      NA
## 1195 NFK 1980                 Norfolk Island 163       0        20      NA
## 1196 NFK 1985                 Norfolk Island 163       0        25      NA
## 1197 NFK 1990                 Norfolk Island 163       0        30      NA
## 1198 NFK 1995                 Norfolk Island 163       0        35      NA
## 1199 NFK 2000                 Norfolk Island 163       0        40      NA
## 1200 NFK 2005                 Norfolk Island 163       0        45      NA
## 1217 NIU 1960                           Niue 166       0         0      NA
## 1218 NIU 1970                           Niue 166       0        10      NA
## 1219 NIU 1980                           Niue 166       0        20      NA
## 1220 NIU 1985                           Niue 166       0        25      NA
## 1221 NIU 1990                           Niue 166       0        30      NA
## 1222 NIU 1995                           Niue 166       0        35      NA
## 1223 NIU 2000                           Niue 166       0        40      NA
## 1224 NIU 2005                           Niue 166       0        45      NA
## 1289 PCN 1960               Pitcairn Islands 175       0         0      NA
## 1290 PCN 1970               Pitcairn Islands 175       0        10      NA
## 1291 PCN 1980               Pitcairn Islands 175       0        20      NA
## 1292 PCN 1985               Pitcairn Islands 175       0        25      NA
## 1293 PCN 1990               Pitcairn Islands 175       0        30      NA
## 1294 PCN 1995               Pitcairn Islands 175       0        35      NA
## 1295 PCN 2000               Pitcairn Islands 175       0        40      NA
## 1296 PCN 2005               Pitcairn Islands 175       0        45      NA
## 1393 REU 1960                        Reunion 188    2930         0      NA
## 1394 REU 1970                        Reunion 188    5000        10      NA
## 1395 REU 1980                        Reunion 188    5000        20      NA
## 1396 REU 1985                        Reunion 188    8000        25      NA
## 1397 REU 1990                        Reunion 188   11000        30      NA
## 1398 REU 1995                        Reunion 188   12000        35      NA
## 1399 REU 2000                        Reunion 188   12000        40      NA
## 1400 REU 2005                        Reunion 188    9722        45      NA
## 1433 SDN 1960                          Sudan 193 1387500         0      NA
## 1434 SDN 1970                          Sudan 193 1625000        10      NA
## 1435 SDN 1980                          Sudan 193 1700000        20      NA
## 1436 SDN 1985                          Sudan 193 1763000        25      NA
## 1437 SDN 1990                          Sudan 193 1800000        30      NA
## 1438 SDN 1995                          Sudan 193 1946000        35      NA
## 1439 SDN 2000                          Sudan 193 1863000        40      NA
## 1440 SDN 2005                          Sudan 193 1863000        45      NA
## 1457 SHN 1960                   Saint Helena 197       0         0      NA
## 1458 SHN 1970                   Saint Helena 197       0        10      NA
## 1459 SHN 1980                   Saint Helena 197       0        20      NA
## 1460 SHN 1985                   Saint Helena 197       0        25      NA
## 1461 SHN 1990                   Saint Helena 197       0        30      NA
## 1462 SHN 1995                   Saint Helena 197       0        35      NA
## 1463 SHN 2000                   Saint Helena 197       0        40      NA
## 1464 SHN 2005                   Saint Helena 197       0        45      NA
## 1465 SJM 1960         Svalbard and Jan Mayen 198       0         0      NA
## 1466 SJM 1970         Svalbard and Jan Mayen 198       0        10      NA
## 1467 SJM 1980         Svalbard and Jan Mayen 198       0        20      NA
## 1468 SJM 1985         Svalbard and Jan Mayen 198       0        25      NA
## 1469 SJM 1990         Svalbard and Jan Mayen 198       0        30      NA
## 1470 SJM 1995         Svalbard and Jan Mayen 198       0        35      NA
## 1471 SJM 2000         Svalbard and Jan Mayen 198       0        40      NA
## 1472 SJM 2005         Svalbard and Jan Mayen 198       0        45      NA
## 1497 SMK 1960     Serbia, Montenegro, Kosovo 400  142175         0      NA
## 1498 SMK 1970     Serbia, Montenegro, Kosovo 400  157450        10      NA
## 1499 SMK 1980     Serbia, Montenegro, Kosovo 400  176250        20      NA
## 1500 SMK 1985     Serbia, Montenegro, Kosovo 400  173900        25      NA
## 1501 SMK 1990     Serbia, Montenegro, Kosovo 400  183364        30      NA
## 1502 SMK 1995     Serbia, Montenegro, Kosovo 400  125960        35      NA
## 1503 SMK 2000     Serbia, Montenegro, Kosovo 400   66663        40      NA
## 1504 SMK 2005     Serbia, Montenegro, Kosovo 400  165426        45      NA
## 1521 SPM 1960        St. Pierre and Miquelon 205       0         0      NA
## 1522 SPM 1970        St. Pierre and Miquelon 205       0        10      NA
## 1523 SPM 1980        St. Pierre and Miquelon 205       0        20      NA
## 1524 SPM 1985        St. Pierre and Miquelon 205       0        25      NA
## 1525 SPM 1990        St. Pierre and Miquelon 205       0        30      NA
## 1526 SPM 1995        St. Pierre and Miquelon 205       0        35      NA
## 1527 SPM 2000        St. Pierre and Miquelon 205       0        40      NA
## 1528 SPM 2005        St. Pierre and Miquelon 205       0        45      NA
## 1633 TKL 1960                        Tokelau 220       0         0      NA
## 1634 TKL 1970                        Tokelau 220       0        10      NA
## 1635 TKL 1980                        Tokelau 220       0        20      NA
## 1636 TKL 1985                        Tokelau 220       0        25      NA
## 1637 TKL 1990                        Tokelau 220       0        30      NA
## 1638 TKL 1995                        Tokelau 220       0        35      NA
## 1639 TKL 2000                        Tokelau 220       0        40      NA
## 1640 TKL 2005                        Tokelau 220       0        45      NA
## 1697 TWN 1960                         Taiwan 228  488697         0      NA
## 1698 TWN 1970                         Taiwan 228  521056        10      NA
## 1699 TWN 1980                         Taiwan 228  546646        20      NA
## 1700 TWN 1985                         Taiwan 228  550372        25      NA
## 1701 TWN 1990                         Taiwan 228  510500        30      NA
## 1702 TWN 1995                         Taiwan 228  525528        35      NA
## 1703 TWN 2000                         Taiwan 228  508990        40      NA
## 1704 TWN 2005                         Taiwan 228  492452        45      NA
## 1801 WLF 1960              Wallis and Futuna 243       0         0      NA
## 1802 WLF 1970              Wallis and Futuna 243       0        10      NA
## 1803 WLF 1980              Wallis and Futuna 243       0        20      NA
## 1804 WLF 1985              Wallis and Futuna 243       0        25      NA
## 1805 WLF 1990              Wallis and Futuna 243       0        30      NA
## 1806 WLF 1995              Wallis and Futuna 243       0        35      NA
## 1807 WLF 2000              Wallis and Futuna 243       0        40      NA
## 1808 WLF 2005              Wallis and Futuna 243       0        45      NA
#fixing counties with no area
aei$area_km[which(aei$ISO == "GUF")] <- 83534 #French Guiana from Britannica
aei$area_km[which(aei$ISO == "GLP")] <- 1630 #Guadeloupe from Britannica 
aei$area_km[which(aei$ISO == "MTQ")] <- 1128 # Martinique from Britannica
aei$area_km[which(aei$ISO == "REU")] <- 2512 #Reunion from Britannica
aei$area_km[which(aei$ISO == "SMK")] <- 13812 + 10905 + 77589 #Montenegro + Kosovo + Serbia from Britannica
aei$area_km[which(aei$ISO == "TWN")] <- 35980 #Taiwan from CIA.gov
aei$area_km[which(aei$ISO == "SDN")] <- 1882000 #Sudan from UNDP

Now, transform the AEI_ha to \(km^2\) and divide to get the % of total country area that is equipped for irrigation. There are still some countries that have 0 AEI_ha and no area_km value. I think I will assign the irrfrac/perc 0 to those countries, but lets wait a little while to do that, most are islands.

aei <- aei %>% mutate(irrperc = ((aei_ha/100)/area_km)*100, irrfrac = ((aei_ha/100)/area_km))

Here is a quick histogram of the target variable. A lot of 0s.

hist(aei$irrperc, breaks = 200)


Chosen Predictor Variables

According to the FAO (Walker 1989), the choice of which irrigation, and by proxy whether to irrigate or not, comes from six factors. The predictor that will be used for this analysis are listed below:

  • Economic

    • \(M_c\) - GDP per Capita - Each country’s GDP per Capita from Gapminder, which collects data from the UN and the Maddison project. This variable will be logged because as stated in Statistical Rethinking (McElreath 2015) McElreath states (pg. 239), “We use the logarithm of it, because the logarithm of GDP is the magnitude of GDP. Since wealth generates wealth, it tends to be exponentially related to anything that increases it”
  • Topographical

    • \(Ru_c\) - Represented here using the “rugged” data set from rethinking, in which a coeff of ruggedness for each country was assigned. This data was taken from Riley, Degloria, and Elliot (1999)
  • Soil Type

    • This will not be included in this analysis. Although, it could be useful there is no good way to quantify soil type on a country level basis. This will be included in the grid cell model, as regions can be easily integrated with the grid cell structure.
  • Water Supply

    • \(R_c\) - Total Precipitation - Translated output from LPJmL. Summed across two dimensions: month and country. In mm/country/year
  • Crop Type

    • Fraction of each crop grown per country - Translated output from LPJmL. Individual crop fraction per grid cell multiplied by respective grid cell area, summed on a country level giving ha of each crop grown per country. This was then divided by total area of the country to achieve the % area occupied by a certain crop per country. Still working out how to work with this predictor.
  • Social Influences

    • \(P_c\) - Total Population - Each country’s total population on a yearly basis. Taken from Gapminder. Again, this will be logged, due to the same argument for GDP, and centered.

All variables will be scaled or centered.

First, the ISO codes and Regions to be able to hold some structure.

#the ISO codes and regional data from Gapminder
iso <- read.csv2("/Volumes/RachelExternal/Thesis/Thesis/Data/GAPminder_ISOcodes.csv")
iso <- iso[, -c(2,7,10,12,13)] #removing extra info

#renaming some cols
iso <- 
  iso %>% 
  rename(ISO = GEO, country = name)

#fixing the variable types
iso$ISO <- as.factor(iso$ISO)
iso$country <- as.factor(iso$country)
iso$four_regions <- as.factor(iso$four_regions)
iso$eight_regions <- as.factor(iso$eight_regions)
iso$six_regions <- as.factor(iso$six_regions)
iso$six_regions <- as.factor(iso$six_regions)
iso$World.bank.region <- as.factor(iso$World.bank.region)
str(iso)
## 'data.frame':    197 obs. of  8 variables:
##  $ ISO              : Factor w/ 197 levels "AFG","AGO","ALB",..: 1 3 50 4 2 8 6 7 9 10 ...
##  $ country          : Factor w/ 197 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ four_regions     : Factor w/ 4 levels "africa","americas",..: 3 4 1 4 1 2 2 4 3 4 ...
##  $ eight_regions    : Factor w/ 8 levels "africa_north",..: 5 7 1 8 2 3 4 7 6 8 ...
##  $ six_regions      : Factor w/ 6 levels "america","east_asia_pacific",..: 5 3 4 3 6 1 1 3 2 3 ...
##  $ Latitude         : num  33 41 28 42.5 -12.5 ...
##  $ Longitude        : num  66 20 3 1.52 18.5 ...
##  $ World.bank.region: Factor w/ 8 levels "","East Asia & Pacific",..: 7 3 5 3 8 4 4 3 2 3 ...
#past me helping future me
#fixing Macedonia, FYR to North Macedonia
iso$country <- as.character(iso$country)
iso[iso == "Macedonia, FYR"] <- "North Macedonia"
iso$country <- as.factor(iso$country)

Cool, we have multiple regions here that we could use in the future for some multilevel business. Also, latitude and longitude is here, which we can also try later. Obviously these lat and lon measurements will mean nothing for a country like China or the USA, but they are nice to have.

Income

Income data is coming from Gapminder, as they have put real effort in to filling all the gaps in income and population for the last 150 years. Income is in terms of the 2011 USD and is inflation adjusted.

#income here is 2011 usd 
income <- read.csv("/Volumes/RachelExternal/Thesis/Thesis/Data/Gapminder_income_per_person_gdppercapita_ppp_inflation_adjusted.csv")
income <- income[, c(1, 162:207)] #removing extra years

#making long data
income <- 
  income %>%  
  pivot_longer(!country, names_to = "year", values_to = "income") 

#still have the xs there
income$year <- str_remove_all(income$year, "[X]")
#transformation of the variable types
income$year <- as.numeric(income$year)
income$country <- as.factor(income$country)

#log and standardize
income$log_income <- log(income$income)
income <- income %>% 
  mutate(log_income_std = log_income / mean(log_income))

#a summary
summary(income)
##                 country          year          income         log_income    
##  Afghanistan        :  46   Min.   :1960   Min.   :   312   Min.   : 5.743  
##  Albania            :  46   1st Qu.:1971   1st Qu.:  2250   1st Qu.: 7.719  
##  Algeria            :  46   Median :1982   Median :  5290   Median : 8.574  
##  Andorra            :  46   Mean   :1982   Mean   : 11344   Mean   : 8.635  
##  Angola             :  46   3rd Qu.:1994   3rd Qu.: 13575   3rd Qu.: 9.516  
##  Antigua and Barbuda:  46   Max.   :2005   Max.   :179000   Max.   :12.095  
##  (Other)            :8602                                                   
##  log_income_std  
##  Min.   :0.6651  
##  1st Qu.:0.8939  
##  Median :0.9929  
##  Mean   :1.0000  
##  3rd Qu.:1.1020  
##  Max.   :1.4007  
## 
#past me helping future me again
#fixing Eswatini to Swaziland (even though Eswatini is the correct name)
income$country <- as.character(income$country)
income[income == "Eswatini"] <- "Swaziland"
income$country <- as.factor(income$country)

Logging and standardizing is necessary here. Again, the argument is that wealth begets wealth, so income is logarithmic. From there just center the mean at 0 with a stdev of 1 for standardization.

Population

Population data is also taken from Gapminder. The same process of logging and standardizing will be carried out here.

#population data also taken from Gapminder
population <- read.csv("/Volumes/RachelExternal/Thesis/Thesis/Data/Gapminder_population_total.csv")

#removing extra years
population <- population[, c(1, 162:207)] 
#making longer data
population <- 
  population %>%  
  pivot_longer(!country, names_to = "year", values_to = "population") 

#removing the Xs from the year strings
population$year <- str_remove_all(population$year, "[X]")

#fixing variable types
population$country <- as.factor(population$country)
population$year <- as.numeric(population$year)

#log and standardize, same argument as income: population begets population
population$log_pop <- log(population$population)
population <- population %>% 
  mutate(log_pop_std = log_pop / mean(log_pop))

summary(population)
##                 country          year        population           log_pop      
##  Afghanistan        :  46   Min.   :1960   Min.   :6.450e+02   Min.   : 6.469  
##  Albania            :  46   1st Qu.:1971   1st Qu.:9.045e+05   1st Qu.:13.715  
##  Algeria            :  46   Median :1982   Median :4.545e+06   Median :15.330  
##  Andorra            :  46   Mean   :1982   Mean   :2.404e+07   Mean   :14.988  
##  Angola             :  46   3rd Qu.:1994   3rd Qu.:1.320e+07   3rd Qu.:16.396  
##  Antigua and Barbuda:  46   Max.   :2005   Max.   :1.330e+09   Max.   :21.008  
##  (Other)            :8694                                                      
##   log_pop_std    
##  Min.   :0.4316  
##  1st Qu.:0.9151  
##  Median :1.0228  
##  Mean   :1.0000  
##  3rd Qu.:1.0939  
##  Max.   :1.4017  
## 
#annnd again...
#fixing Eswatini to Swaziland (even though Eswatini is the correct name)
population$country <- as.character(population$country)
population[population == "Eswatini"] <- "Swaziland"
population$country <- as.factor(population$country)

So some issues arise here with the fact that we have different country data for each dataframe. Sometimes the names are capitalized, other times there are spaces instead of underscores. By using anti_join() I can see what won’t be joined. Also this goes both ways, anti_join() returns rows of x that do not have a match in y.

isopop <- anti_join(iso, population, by = "country")
summary(isopop)
##       ISO                country    four_regions            eight_regions
##  HKG    :1   Hong Kong, China:1   africa  :0     east_asia_pacific :2    
##  TWN    :1   Taiwan          :1   americas:0     africa_north      :0    
##  AFG    :0   Afghanistan     :0   asia    :2     africa_sub_saharan:0    
##  AGO    :0   Albania         :0   europe  :0     america_north     :0    
##  ALB    :0   Algeria         :0                  america_south     :0    
##  AND    :0   Andorra         :0                  asia_west         :0    
##  (Other):0   (Other)         :0                  (Other)           :0    
##                    six_regions    Latitude       Longitude    
##  america                 :0    Min.   :22.29   Min.   :114.2  
##  east_asia_pacific       :2    1st Qu.:22.71   1st Qu.:115.9  
##  europe_central_asia     :0    Median :23.14   Median :117.6  
##  middle_east_north_africa:0    Mean   :23.14   Mean   :117.6  
##  south_asia              :0    3rd Qu.:23.57   3rd Qu.:119.3  
##  sub_saharan_africa      :0    Max.   :24.00   Max.   :121.0  
##                                                               
##                   World.bank.region
##  East Asia & Pacific       :2      
##                            :0      
##  Europe & Central Asia     :0      
##  Latin America & Caribbean :0      
##  Middle East & North Africa:0      
##  North America             :0      
##  (Other)                   :0
popiso <- anti_join(population, iso, by = "country")
summary(popiso)
##                 country       year       population     log_pop   
##  Afghanistan        :0   Min.   : NA   Min.   : NA   Min.   : NA  
##  Albania            :0   1st Qu.: NA   1st Qu.: NA   1st Qu.: NA  
##  Algeria            :0   Median : NA   Median : NA   Median : NA  
##  Andorra            :0   Mean   :NaN   Mean   :NaN   Mean   :NaN  
##  Angola             :0   3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA  
##  Antigua and Barbuda:0   Max.   : NA   Max.   : NA   Max.   : NA  
##  (Other)            :0                                            
##   log_pop_std 
##  Min.   : NA  
##  1st Qu.: NA  
##  Median : NA  
##  Mean   :NaN  
##  3rd Qu.: NA  
##  Max.   : NA  
## 

Went back up and solved the issue of North Macedonia and Swaziland. There are still issues with Hong Kong and Taiwan as they are part of China. ISO has data for Hong Kong and Taiwan that population does not have. I wont include them in the HTML but they are in the code.

Alright, so I will left join ISO and population then again with income? Not sure that this will solve my problem, but lets try it.

#remove all these rando tables
rm(popinc, inciso, incpop, popiso, isoinc, isopop)

#
iso <- left_join(iso, population, by = "country")
iso <- left_join(iso, income, by = c("country", "year"))

# alright 197 countries! whoop!
str(iso)
## 'data.frame':    8972 obs. of  15 variables:
##  $ ISO              : Factor w/ 197 levels "AFG","AGO","ALB",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ country          : Factor w/ 197 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ four_regions     : Factor w/ 4 levels "africa","americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ eight_regions    : Factor w/ 8 levels "africa_north",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ six_regions      : Factor w/ 6 levels "america","east_asia_pacific",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ Latitude         : num  33 33 33 33 33 33 33 33 33 33 ...
##  $ Longitude        : num  66 66 66 66 66 66 66 66 66 66 ...
##  $ World.bank.region: Factor w/ 8 levels "","East Asia & Pacific",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ year             : num  1960 1961 1962 1963 1964 ...
##  $ population       : int  9000000 9170000 9350000 9540000 9740000 9960000 10200000 10400000 10600000 10900000 ...
##  $ log_pop          : num  16 16 16.1 16.1 16.1 ...
##  $ log_pop_std      : num  1.07 1.07 1.07 1.07 1.07 ...
##  $ income           : int  2740 2700 2680 2670 2650 2640 2600 2600 2620 2590 ...
##  $ log_income       : num  7.92 7.9 7.89 7.89 7.88 ...
##  $ log_income_std   : num  0.917 0.915 0.914 0.914 0.913 ...
rm(income, population)

Now, join ISO to AEI with a merge (cause in this case the dynamics of merge are clearer for me). Hoping to preserve all data!

aei <- merge(aei, iso, by = c("ISO","country", "year"), all.x = TRUE)
rm(iso)
str(aei) 
## 'data.frame':    1848 obs. of  21 variables:
##  $ ISO              : Factor w/ 232 levels "","ABW","AFG",..: 2 2 2 2 2 2 2 2 3 3 ...
##  $ country          : Factor w/ 232 levels "Afghanistan",..: 12 12 12 12 12 12 12 12 1 1 ...
##  $ year             : num  1960 1970 1980 1985 1990 ...
##  $ ID               : int  1 1 1 1 1 1 1 1 2 2 ...
##  $ aei_ha           : num  0 0 0 0 0 ...
##  $ yearcount        : num  0 10 20 25 30 35 40 45 0 10 ...
##  $ area_km          : num  180 180 180 180 180 ...
##  $ irrperc          : num  0 0 0 0 0 ...
##  $ irrfrac          : num  0 0 0 0 0 ...
##  $ four_regions     : Factor w/ 4 levels "africa","americas",..: NA NA NA NA NA NA NA NA 3 3 ...
##  $ eight_regions    : Factor w/ 8 levels "africa_north",..: NA NA NA NA NA NA NA NA 5 5 ...
##  $ six_regions      : Factor w/ 6 levels "america","east_asia_pacific",..: NA NA NA NA NA NA NA NA 5 5 ...
##  $ Latitude         : num  NA NA NA NA NA NA NA NA 33 33 ...
##  $ Longitude        : num  NA NA NA NA NA NA NA NA 66 66 ...
##  $ World.bank.region: Factor w/ 8 levels "","East Asia & Pacific",..: NA NA NA NA NA NA NA NA 7 7 ...
##  $ population       : int  NA NA NA NA NA NA NA NA 9000000 11200000 ...
##  $ log_pop          : num  NA NA NA NA NA ...
##  $ log_pop_std      : num  NA NA NA NA NA ...
##  $ income           : int  NA NA NA NA NA NA NA NA 2740 2570 ...
##  $ log_income       : num  NA NA NA NA NA ...
##  $ log_income_std   : num  NA NA NA NA NA ...

Solved the issue, 232 factors for both countries and ISO, meaning that we have been using the original AEI file from Siebert as the main data frame to join to. Good. This is the data that needs to be maintained, NAs can pop up in other cols but not irrigated area.

But I still would like total GDP for later so lets transform using population.

aei$income <- as.numeric(aei$income) 

#mutating for total GDP and log GDP
aei <-
  aei %>%
  mutate(GDPtot = income*population) %>%
  mutate(log_gdp_tot = log(GDPtot)) 

#creating the df to standardize without the NAs
aei2 <-
  aei %>% 
  select(ISO, year, log_gdp_tot) %>%
  na.omit() %>% 
  mutate(log_gdp_tot_std = log_gdp_tot/mean(log_gdp_tot)) %>% 
  select(ISO, year, log_gdp_tot_std)

#now rejoin!
aei <- 
  left_join(aei, aei2, by = c("ISO", "year"))

str(aei)
## 'data.frame':    1848 obs. of  24 variables:
##  $ ISO              : Factor w/ 232 levels "","ABW","AFG",..: 2 2 2 2 2 2 2 2 3 3 ...
##  $ country          : Factor w/ 232 levels "Afghanistan",..: 12 12 12 12 12 12 12 12 1 1 ...
##  $ year             : num  1960 1970 1980 1985 1990 ...
##  $ ID               : int  1 1 1 1 1 1 1 1 2 2 ...
##  $ aei_ha           : num  0 0 0 0 0 ...
##  $ yearcount        : num  0 10 20 25 30 35 40 45 0 10 ...
##  $ area_km          : num  180 180 180 180 180 ...
##  $ irrperc          : num  0 0 0 0 0 ...
##  $ irrfrac          : num  0 0 0 0 0 ...
##  $ four_regions     : Factor w/ 4 levels "africa","americas",..: NA NA NA NA NA NA NA NA 3 3 ...
##  $ eight_regions    : Factor w/ 8 levels "africa_north",..: NA NA NA NA NA NA NA NA 5 5 ...
##  $ six_regions      : Factor w/ 6 levels "america","east_asia_pacific",..: NA NA NA NA NA NA NA NA 5 5 ...
##  $ Latitude         : num  NA NA NA NA NA NA NA NA 33 33 ...
##  $ Longitude        : num  NA NA NA NA NA NA NA NA 66 66 ...
##  $ World.bank.region: Factor w/ 8 levels "","East Asia & Pacific",..: NA NA NA NA NA NA NA NA 7 7 ...
##  $ population       : int  NA NA NA NA NA NA NA NA 9000000 11200000 ...
##  $ log_pop          : num  NA NA NA NA NA ...
##  $ log_pop_std      : num  NA NA NA NA NA ...
##  $ income           : num  NA NA NA NA NA NA NA NA 2740 2570 ...
##  $ log_income       : num  NA NA NA NA NA ...
##  $ log_income_std   : num  NA NA NA NA NA ...
##  $ GDPtot           : num  NA NA NA NA NA ...
##  $ log_gdp_tot      : num  NA NA NA NA NA ...
##  $ log_gdp_tot_std  : num  NA NA NA NA NA ...

Finally, lets check out a quick histogram of GDP and pop,

hist(aei$log_gdp_tot_std, breaks = 100)

hist(aei$log_pop_std, breaks = 100)

Dealing with Some Issues

There are a lot of countries that have no data for the regions, but have non zero irrperc/frac which I will use in further analysis, so we need to replace the regions.

#view which countries don't have regions but have non zero irrprec
na_regions <- aei[is.na(aei$four_regions),]

#Aruba
aei <- aei %>%
  mutate(four_regions = replace(four_regions, ISO == 'ABW', 'americas'), 
         six_regions = replace(six_regions, ISO == 'ABW', 'america'), 
         eight_regions = replace(eight_regions, ISO == 'ABW', 'america_north')) %>%
#american samoa
  mutate(four_regions = replace(four_regions,ISO == 'ASM', 'asia'), 
         six_regions = replace(six_regions,ISO == 'ASM', 'east_asia_pacific'), 
         eight_regions = replace(eight_regions,ISO == 'ASM', 'east_asia_pacific')) %>%
#bermuda
  mutate(four_regions = replace(four_regions,ISO == 'BMU', 'americas'), 
         six_regions = replace(six_regions,ISO == 'BMU', 'america'), 
         eight_regions = replace(eight_regions,ISO == 'BMU', 'america_north')) %>% 
#cote d'ivoire
  mutate(four_regions = replace(four_regions,ISO == 'CIV', 'africa'), 
         six_regions = replace(six_regions,ISO == 'CIV', 'sub_saharan_africa'), 
         eight_regions = replace(eight_regions,ISO == 'CIV', 'africa_sub_saharan')) %>% 
#democratic republic of congo
  mutate(four_regions = replace(four_regions,ISO == 'COD', 'africa'), 
         six_regions = replace(six_regions,ISO == 'COD', 'sub_saharan_africa'), 
         eight_regions = replace(eight_regions,ISO == 'COD', 'africa_sub_saharan')) %>% 
#republic of congo
  mutate(four_regions = replace(four_regions,ISO == 'COG', 'africa'), 
         six_regions = replace(six_regions,ISO == 'COG', 'sub_saharan_africa'), 
         eight_regions = replace(eight_regions,ISO == 'COG', 'africa_sub_saharan')) %>% 
#cayman islands
  mutate(four_regions = replace(four_regions,ISO == 'CYM', 'americas'), 
         six_regions = replace(six_regions,ISO == 'CYM', 'america'), 
         eight_regions = replace(eight_regions,ISO == 'CYM', 'america_north')) %>%
#Faroe islands
  mutate(four_regions = replace(four_regions,ISO == 'FRO', 'europe'), 
         six_regions = replace(six_regions,ISO == 'FRO', 'europe_central_asia'), 
         eight_regions = replace(eight_regions,ISO == 'FRO', 'europe_west')) %>%
#micronesia
  mutate(four_regions = replace(four_regions,ISO == 'FSM', 'asia'), 
         six_regions = replace(six_regions,ISO == 'FSM', 'east_asia_pacific'), 
         eight_regions = replace(eight_regions,ISO == 'FSM', 'east_asia_pacific')) %>%
#gibraltar
  mutate(four_regions = replace(four_regions,ISO == 'GIB', 'europe'), 
         six_regions = replace(six_regions,ISO == 'GIB', 'europe_central_asia'), 
         eight_regions = replace(eight_regions,ISO == 'GIB', 'europe_west')) %>%
#Guadeloupe
  mutate(four_regions = replace(four_regions,ISO == 'GLP', 'americas'), 
         six_regions = replace(six_regions,ISO == 'GLP', 'america'), 
         eight_regions = replace(eight_regions,ISO == 'GLP', 'america_north')) %>%
#greenland
  mutate(four_regions = replace(four_regions,ISO == 'GRL', 'europe'), 
         six_regions = replace(six_regions,ISO == 'GRL', 'europe_central_asia'), 
         eight_regions = replace(eight_regions,ISO == 'GRL', 'europe_west')) %>%
#french Guiana 
  mutate(four_regions = replace(four_regions,ISO == 'GUF', 'americas'), 
         six_regions = replace(six_regions,ISO == 'GUF', 'america'), 
         eight_regions = replace(eight_regions,ISO == 'GUF', 'america_south')) %>%
#guam
  mutate(four_regions = replace(four_regions,ISO == 'GUM', 'asia'), 
         six_regions = replace(six_regions,ISO == 'GUM', 'east_asia_pacific'), 
         eight_regions = replace(eight_regions,ISO == 'GUM', 'east_asia_pacific')) %>%
#Kyrgyzstan
  mutate(four_regions = replace(four_regions,ISO == 'KGZ', 'asia'), 
         six_regions = replace(six_regions,ISO == 'KGZ', 'europe_central_asia'), 
         eight_regions = replace(eight_regions,ISO == 'KGZ', 'asia_west')) %>%
#laos
  mutate(four_regions = replace(four_regions,ISO == 'LAO', 'asia'), 
         six_regions = replace(six_regions,ISO == 'LAO', 'east_asia_pacific'), 
         eight_regions = replace(eight_regions,ISO == 'LAO', 'east_asia_pacific')) %>%
#Macedonia
  mutate(four_regions = replace(four_regions,ISO == 'MKD', 'europe'), 
         six_regions = replace(six_regions,ISO == 'MKD', 'europe_central_asia'), 
         eight_regions = replace(eight_regions,ISO == 'MKD', 'europe_east')) %>%
#Martinique 
  mutate(four_regions = replace(four_regions,ISO == 'MTQ', 'americas'), 
         six_regions = replace(six_regions,ISO == 'MTQ', 'america'), 
         eight_regions = replace(eight_regions,ISO == 'MTQ', 'america_north')) %>%
#new caledonia
  mutate(four_regions = replace(four_regions,ISO == 'NCL', 'asia'), 
         six_regions = replace(six_regions,ISO == 'NCL', 'east_asia_pacific'), 
         eight_regions = replace(eight_regions,ISO == 'NCL', 'east_asia_pacific')) %>%
#puerto rico
 mutate(four_regions = replace(four_regions,ISO == 'PRI', 'americas'), 
         six_regions = replace(six_regions,ISO == 'PRI', 'america'), 
         eight_regions = replace(eight_regions,ISO == 'PRI', 'america_north')) %>%
#Palestine (West Bank + Gaza strip)
 mutate(four_regions = replace(four_regions,ISO == 'PSE', 'asia'), 
         six_regions = replace(six_regions,ISO == 'PSE', 'middle_east_north_africa'), 
         eight_regions = replace(eight_regions,ISO == 'PSE', 'asia_west')) %>%
#french Polynesia
  mutate(four_regions = replace(four_regions,ISO == 'PYF', 'asia'), 
         six_regions = replace(six_regions,ISO == 'PYF', 'east_asia_pacific'), 
         eight_regions = replace(eight_regions,ISO == 'PYF', 'east_asia_pacific')) %>%
#Reunion
  mutate(four_regions = replace(four_regions,ISO == 'REU', 'africa'), 
         six_regions = replace(six_regions,ISO == 'REU', 'sub_saharan_africa'), 
         eight_regions = replace(eight_regions,ISO == 'REU', 'africa_sub_saharan')) %>%
#Serbia, Montenegro, Kosovo
  mutate(four_regions = replace(four_regions,ISO == 'SMK', 'europe'), 
         six_regions = replace(six_regions,ISO == 'SMK', 'europe_central_asia'), 
         eight_regions = replace(eight_regions,ISO == 'SMK', 'europe_east')) %>%
#Slovakia
  mutate(four_regions = replace(four_regions,ISO == 'SVK', 'europe'), 
         six_regions = replace(six_regions,ISO == 'SVK', 'europe_central_asia'), 
         eight_regions = replace(eight_regions,ISO == 'SVK', 'europe_east')) %>%
#Turks and Caicos Islands
  mutate(four_regions = replace(four_regions,ISO == 'TCA', 'americas'), 
         six_regions = replace(six_regions,ISO == 'TCA', 'america'), 
         eight_regions = replace(eight_regions,ISO == 'TCA', 'america_north')) %>%
#East Timor
  mutate(four_regions = replace(four_regions,ISO == 'TLS', 'asia'), 
         six_regions = replace(six_regions,ISO == 'TLS', 'east_asia_pacific'), 
         eight_regions = replace(eight_regions,ISO == 'TLS', 'east_asia_pacific')) %>%
#taiwan
  mutate(four_regions = replace(four_regions,ISO == 'TWN', 'asia'), 
         six_regions = replace(six_regions,ISO == 'TWN', 'east_asia_pacific'), 
         eight_regions = replace(eight_regions,ISO == 'TWN', 'east_asia_pacific')) %>%
#British virgin Islands
  mutate(four_regions = replace(four_regions,ISO == 'VGB', 'americas'), 
         six_regions = replace(six_regions,ISO == 'VGB', 'america'), 
         eight_regions = replace(eight_regions,ISO == 'VGB', 'america_north')) %>%
#US Virgin Islands
  mutate(four_regions = replace(four_regions,ISO == 'VIR', 'americas'), 
         six_regions = replace(six_regions,ISO == 'VIR', 'america'), 
         eight_regions = replace(eight_regions, ISO == 'VIR', 'america_north')) 

Next time I’ll write a function.

There are also a lot of countries that have NA values for their income or population. Take a peek here.

lotanas <-  aei[rowSums(is.na(aei)) > 0,]
lotanas %>% 
  group_by(country) %>% 
  select(income, population) %>%  
  summarise_all(funs(sum(is.na(.))))
## # A tibble: 51 x 3
##    country                        income population
##    <fct>                           <int>      <int>
##  1 American Samoa                      8          8
##  2 Anguilla                            8          8
##  3 Antarctica                          8          8
##  4 Aruba                               8          8
##  5 Bermuda                             8          8
##  6 Bouvet Island                       8          8
##  7 British Indian Ocean Territory      8          8
##  8 British Virgin Islands              8          8
##  9 Cayman Islands                      8          8
## 10 Christmas Island                    8          8
## # … with 41 more rows

The countries that don’t have income also don’t have population, so as least I am consistent in my missing data. I have to move on so I will just leave this here for a minute. Perhaps later we should fill these variables.

Precipitation

Again, precipitation is translated output from LPJmL. Precipitation was chosen because it could be summed across countries, as other data could not (like discharge or ET). I have summed the precipitation by country and year. I am not sure whether I should scale or standardize precip, I will do all the possible combinations at the moment.

#precipitation
#cftandprecip is a script with all the grid cell values. 
cftandprecip <- read.csv('/Volumes/RachelExternal/Thesis/Thesis/Data/cft_and_precipdata.csv')
cftandprecip$ISO <- as.factor(cftandprecip$ISO)
cftandprecip <- cftandprecip[, -c(1)] #remove id col
cftandprecip <- subset(cftandprecip, year >= 1960)

#log and scale
precip <- 
  cftandprecip %>% 
  select(ISO, year, precipmm) %>% 
  group_by(ISO, year) %>% 
  summarise(preciptot = (sum(precipmm))) %>% #sum grid cells by ISO and year
  mutate(precip_sc = preciptot/max(preciptot)) %>% 
  mutate(precip_std = preciptot/mean(preciptot)) %>% 
  mutate(log_precip = log(preciptot+1)) %>% #1 is added here to prevent -Inf
  mutate(log_precip_sc = log_precip/max(log_precip)) %>% 
  mutate(log_precip_std = log_precip/mean(log_precip))

#merge
aei <- merge(aei, precip, by = c("ISO", "year"), all.x = TRUE)
rm(precip)

And some histograms to check out the scaled/standardized data.

hist(aei$preciptot, breaks = 100)

hist(aei$precip_sc, breaks = 100)

hist(aei$precip_std, breaks = 100)

hist(aei$log_precip, breaks = 100)

hist(aei$log_precip_sc, breaks = 100)

hist(aei$log_precip_std, breaks = 100)

Crop Fraction

Uff. Then comes crop fraction. As mentioned above individual crop fraction per grid cell is multiplied by respective grid cell area, summed on a country level giving ha of each crop grown per country. This was then divided by total area of the country to achieve the % area occupied by a certain crop per country.

#pull cft frac outta here
cft <- 
  cftandprecip %>% 
  select(-c(4)) %>% #removing precip
  mutate_each(funs(.*cellaream2), c(4:35)) %>% #multiply all fractions with the cell area
  group_by(ISO, year) %>% #group in to country and year
  summarise_each(funs(sum)) %>% #take col sums for each crop
  mutate(across(c(3:34), .fns = ~./cellaream2)) #divide col sums by total country area 

#removing categories with 0 (the biofuels and such)
cft <- cft[, -c(18, 19, 26, 34, 35)]

#and a merge
aei <- merge(aei, cft, by = c("ISO", "year"), all.x = TRUE)
rm(cft, cftandprecip)

Idk how to show the data here. Any ideas on representation would be helpful here.

Topographical

Topographical data is represented here by the data set from Riley, Degloria, and Elliot (1999) and is conviently available through the library(rethinking) from McElreath (2015). It provides a ruggedness factor for each country.

#rugged
data("rugged")
rug <- rugged %>% select(c("isocode", "rugged")) 
colnames(rug) <- c("ISO", "rugged")
#scaling
rug$rugged_sc <- rug$rugged / max(rug$rugged)
#merge
aei <- merge(aei, rug, by = "ISO", all.x = TRUE)
rm(rug, rugged)
hist(aei$rugged_sc, breaks = 100)


Time Series Evolution of Country Level Target Variable

Ok, so lets peek at how things change over time. Here is a graph of percentage of country area equipped for irrigation from 1960 - 2005.

pirrfrac <-
  ggplot(data = subset(aei, !is.na(irrperc)), 
         mapping = aes(x=year, y= irrperc, group = country, color = six_regions)) +
  geom_line() +
  scale_x_continuous() +
  theme(panel.grid = element_blank()) 

#run this separately, otherwise the plotly part doesn't work.
ggplotly(pirrfrac)

Although this graph has a lot of countries and can get quite messy, some general trends appear. South Asian countries seem to have high percentages of irrigated area per total land area (irrigation percentage). Then seemingly followed by East Asian and European countries. Africa and the Middle East do not seem to be irrigated as much. Be aware that you can these graphs are interactive, you may zoom, pan or click on the legend to isolate the regions. Although, a regional breakdown follows.

Sub-Saharan Africa

In general, Africa has low levels of irrigated area. Swaziland, Sao Tome and Principe, Reunion and Mauritius lead here with majority of others falling below the levels of these countries.

Europe and Centrtal Asia

Europe contains some countries that reach higher percentages of irrigated area. Countries such as the Netherlands, Romania, Albania, and Azerbaijan have reached peaks in the past (roughly 10-15% irrigated area) but seem to be in a decline towards the end our study period. Other countries such as Italy and Denmark have had more stable rises with less down turn toward. Unexpectedly, Russia has a very low percentage of irrigated area, however at this point, the assumption is that either Russia’s enormous size cancels out its irrigated area, or that Russia receives enough precipitation for the crops it grows that irrigation is not necessary. Further analysis should reveal this.

Americas

In the Americas, the % of area irrigated by countries is lower than Asia or Europe. Peaks here are Cuba and the Dominican Republic at around roughly 6-8% of their total countries being irrigated. There are some islands (Barbados and St. Lucia) that have sharp increases in irrigation fraction, but this could be because they are islands and any irrigation at all would induce the ration of irrigated land to total land spike. Counties that have experienced a steady increase, but are not the leaders in overall irrigation fraction are countries that are big and produce a lot of produce (USA, Mexico, Chile) The general trends of irrigation increase seem to be upward with perhaps a slight leveling off toward the end of the study period (for some countries) but lacking the distinct downturn that is seen in Europe.

East Asia and Oceania

The East Asian countries are quite high in % of irrigated area, when compared to the Americas or Africa. The four countries that reach upwards of (roughly) 10% or more of their total area irrigated toward the end of the study period are Vietnam, North Korea, South Korea, and Thailand. Other countries that have high irrfracs are China, Japan and the Philippines. It is expected that for this region the irrigation fraction would be high due to the amount of rice produced.

Oceania, due to its large amount of islands, does not have a lot of irrigation, majority of countries have 0% irrigation. New Zealand (around 2%) and Australia (less than 1%) have some irrigation. Although this could have a multitude of reasons why irrigation is so low here, regardless of the fact that Australia and New Zealand are very populous countries with mouths to feed. Australia is very large and perhaps its enormity cancels out the irrigated land it does have.

South Asia

Very few countries here, countries in this area are large. Very high levels of irrigation here coming from most countries. Bangladesh, Pakistan, and India are heavily irrigated compared to their land area.

Middle East and North Africa

Irrigation is also prevalent here (but not as much as in south Asia). These countries are dry and therefore would need irrigation to have domestic food supplies. Lebanon, Iraq and Israel lead here. Oil producing countries have little irrigation, they are dry and their GDP is high, perhaps importation is the easiest option for them?


Overall, this might be a good overview of what irrperc looks like for each region.

theme_set(theme_minimal())

ggplot(subset(aei, !is.na(irrperc)),
       aes(x=six_regions, y=irrperc, fill=six_regions)) + 
  geom_boxplot(show.legend = FALSE)  + 
  coord_flip() +
  theme(legend.title = element_blank())


References

McElreath, Richard. 2015. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. CRC press. https://xcelab.net/rm/statistical-rethinking/.
Riley, Shawn, Stephen Degloria, and S. D. Elliot. 1999. “A Terrain Ruggedness Index That Quantifies Topographic Heterogeneity.” Internation Journal of Science 5 (January): 23–27.
Siebert, Stefan, Matti Kummu, Miina Porkka, Petra Doell, Navin Ramankutty, and Bridget Scanlon. 2015. “A Global Data Set of the Extent of Irrigated Land from 1900 to 2005.” Hydrology and Earth System Sciences 19 (March): 1521–45. https://doi.org/10.5194/hess-19-1521-2015.
Walker, W. R. 1989. FAO Irrigation and Drainage Paper 45.” 1. Vol. 45. Rome.